We revisit the matrix problems sparse null space and matrix sparsification, and show that they are equivalent.
We then proceed to seek algorithms for these problems: We prove the hardness of approximation
of these problems, and also give a powerful tool to extend algorithms and heuristics for sparse approximation
theory to these problems.

1 Introduction

In this paper, we revisit the matrix problems sparse null space and matrix sparsification. 



The sparse null space problem was first considered by Pothen in 1984 [27]. The problem asks, given a matrix
A, to find a matrix N that is a full null matrix for A - that is, N is full rank and the columns of N span 
the null space of A. Further, N should be sparse, i.e. contain as few nonzero values as possible. The sparse 
null space problem is motivated by its use to solve Linear Equality Problems (LEPs) [SJ. LEPs arise in 
the solution of constrained optimization problems via generalized gradient descent, segmented Lagrangian, 
and projected Lagrangian methods. Berry et al. [4] consider the sparse null space problem in the context 
of the dual variable method for the Navier-Stokes equations, or more generally in the context of null space 
methods for quadratic programming. Gilbert and Heath [16] noted that among the numerous applications
of the sparse null space problem arising in solutions of underdetermined systems of linear equations is
the efficient solution to the force method (or flexibility method) for structural analysis, which uses the
null space to create multiple linear systems. Finding a sparse null space will decrease the run time and
memory required for solving these systems. More recently, it was shown [36, 26] that the sparse null space
problem can be used to find correlations between small numbers of time series, such as financial stocks.
The decision version of the sparse null space problem is known to be NP-Complete [9], and only heuristic
solutions have been suggested for the minimization problem [9, 16, 4].

The matrix sparsification problem is of the same flavor as sparse null space. One is given a full rank matrix 
A, and the task is to find another matrix B that is equivalent to A under elementary column operations, 
and contains as few nonzero values as possible. Many fundamental matrix operations are greatly simplified 
by first sparsifying a matrix (see [12]) and the problem has applications in areas such as machine learning
[30] and in discovering cycle bases of graphs [20]. But there seem to be only a small number of heuristics
for matrix sparsification ([7] for example), or algorithms under limiting assumptions ([17] considers matrices
that satisfy the Haar condition), but no general approximation algorithms. McCormick [22] established
that the decision version of this problem is NP-Complete.

For these two classic problems, we wish to investigate the potential and limits of approximation algorithms
both for the general problems and for some variants under simplifying assumptions. To this end, we will
need to consider the well-known vector problems min unsatisfy and exact dictionary representation (elsewhere
called the sparse approximation or highly nonlinear approximation problem [32]).

The min unsatisfy problem is an intuitive problem on linear equations. Given a system Ax = b of linear 
equations (where A is an integer m x n matrix and b is an integer m-vector), the problem is to provide
a rational n-vector x; the measure to be minimized is the number of equations of Ax = b not satisfied by x.
The term "min unsatisfy" was first coined by Arora et al. [2] in a seminal paper on the hardness of
approximation, but they claim that the NP-Completeness of the decision version of this problem
is implicit in a 1978 paper of Johnson and Preparata [18]. Arora et al. demonstrated that it is hard to
approximate min unsatisfy to within a factor $2^{\log^{.5-o(1)} n}$ of optimal (under the assumption that NP does not
admit a quasi-polynomial time deterministic algorithm). This hardness result holds over Q, and stronger
results are known for finite fields [10]. For this problem, Berman and Karpinski [3] gave a randomized
$\frac{m}{c \log m}$-approximation algorithm (where c is a constant). We know of no heuristics studied for this problem.

The exact dictionary representation problem is the fundamental problem in sparse approximation theory 
(see [23]). In this problem, we are given a matrix of dictionary vectors D and a target vector s, and the
task is to find the smallest set D' ⊆ D such that a linear combination of the vectors of D' is equal to
s. This problem and its variants have been well studied. According to Temlyakov [31], a variant of this
problem may be found as early as 1907, in a paper of Schmidt [28]. The decision version of this problem
was shown to be NP-Complete by Natarajan [24]. (See [21] for further discussion.)

The field of sparse approximation theory has become exceedingly popular: For example, SPAR05 was 
largely devoted to it, as was the SparseLand 2006 workshop at Princeton, and a mini-symposium at NYU's 
Courant Institute in 2007. The applications of sparse approximation theory include signal representation 
and recovery [8, 25], amplitude optimization [29] and function approximation [24]. When the dictionary
vectors are Fourier coefficients, this problem is a classic problem in Fourier analysis, with applications 
in data compression, feature extraction, locating approximate periods and similar data mining problems 
[37 \ 114 1 ITS" } IS]. There is a host of results for this problem, though all are heuristics or approximations under 
some qualifying assumptions. In fact, Amaldi and Kann [1] showed that this problem (they called it RVLS
- 'relevant variables in the linear system') is as hard to approximate as min unsatisfy, though their result 
seems to have escaped the notice of the sparse approximation theory community. 

Our contribution. As a first step, we note that the matrix problems sparse null space and matrix 
sparsification are equivalent, and that the vector problems min unsatisfy and exact dictionary representation 
are equivalent as well. (Note that although these equivalences are straightforward, they seem to have escaped
researchers in this field. For example, [5] claimed that the sparse null space problem is computationally
more difficult than matrix sparsification.)

We then proceed to show that matrix sparsification is hard to approximate, via a reduction from min 
unsatisfy. We will thereby show that the two matrix problems are hard to approximate within a factor
$2^{\log^{.5-o(1)} n}$ of optimal (assuming NP does not admit quasi-polynomial time deterministic algorithms).

This hardness result for matrix sparsification is important in its own right, but it further leads us to ask 
what can be done for this problem. Specifically, what restrictions or simplifying assumptions may be 
made upon the input matrix to make the matrix sparsification problem tractable? In addressing this question,
we provide the major contribution of this paper and show how to adapt the vast number of heuristics
and algorithms for exact dictionary representation to solve matrix sparsification (and hence sparse null space
as well). This allows us to conclude, for example, that matrix sparsification admits a randomized
$\frac{m}{c \log m}$-approximation algorithm, and also to give limiting conditions under which a known $\ell_1$ relaxation scheme
for exact dictionary representation solves matrix sparsification exactly. Our results also carry over to relaxed
versions of these problems, where the input is extended by an error term $\delta$ which relaxes a constraint.
These versions are defined in the appendix ([B]), although we omit the proof of their equivalence. All of
our results assume that the vector variables are over Q.

An outline of our paper follows: In Section [2] we review some linear algebra and introduce notation. In
closing the preliminary section ([2.3]), we prove equivalences between the two matrix problems and the two
vector problems. In Section [3] we prove that matrix sparsification is hard to approximate, and in Section [4]
we show how to adapt algorithms for exact dictionary representation to solve matrix sparsification.

2 Preliminaries 

In this section we review some linear algebra, introduce notation and definitions, and formally state our 
four problems. 

2.1 Linear algebra and notation. 

Matrix and vector properties. Given a set V of n m-dimensional column vectors, an m-vector v ∉ V
is independent of the vectors of V if there is no linear combination of vectors in V that equals v. A set of
vectors is independent if each vector in the set is independent of the rest.

Now let the vectors of V be arranged as columns of an m x n matrix A; we refer to a column of A as $a_j$,
and to a position in A as $a_{ij}$. We define #col(A) to be the number of columns of A. The column span
of A (col(A)) is the (infinite) set of column vectors that can be produced by a linear combination of the
columns of A. The column rank of A is the dimension of the column space of A (rank(A) = dim(col(A)));
it is the size of a maximal independent subset of the columns of A. If the column rank of A is equal to
n, then the columns of A are independent, and A is said to be full rank.

Other matrices may be produced from A using elementary column operations. These include multiplying 
columns by a nonzero factor, interchanging columns, and adding a multiple of one column to another. 
These operations produce a matrix A' which has the same column span as A; we say A and A' are column 
equivalent. It can be shown that A, A' are column equivalent iff A' = AX for some invertible matrix X. 

Let R be a set of rows of A, and C be a set of columns. A(R, C) is the submatrix of A restricted to R and 
C. Let A(:, C) (A(R, :)) be the submatrix of A restricted to all rows of A and to columns in C (restricted to 
the rows of R and all columns in A). A square matrix is an m x m matrix. A square matrix is nonsingular
if it is invertible. 

Null space. The null space (or kernel) of A (null(A)) is the set of all n-length vectors b for which
Ab = 0. The dimension of A's null space is called the corank of A. The rank-nullity theorem states that for any
matrix A, rank(A) + corank(A) = n. Let N be a matrix consisting of column vectors in the null space of
A; we have that AN = 0. If the rank of N is equal to the corank of A then N is a full null matrix for A.

Given matrix A, a full null matrix for A can be constructed in polynomial time. Similarly, given a full
rank matrix N, polynomial time suffices to construct a matrix A for which N is a full null matrix [26].
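
As an illustration of the polynomial-time construction mentioned above, the following is a minimal sketch (not taken from the paper) that builds a full null matrix by Gauss-Jordan elimination over Q. The function name null_space_basis and the list-of-rows representation are assumptions made for this example; exact rational arithmetic is used so that zero tests are reliable.

```python
from fractions import Fraction

def null_space_basis(A):
    """Return a basis of null(A) as a list of vectors (lists of Fractions).

    A is a list of rows. The returned vectors are the columns of a full
    null matrix for A (hypothetical helper, for illustration only).
    """
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    pivots = []  # pivots[k] is the pivot column of reduced row k
    r = 0
    for c in range(n):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue  # column c is a free column
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        # One null vector per free column: set it to 1 and solve for pivots.
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -M[row][fc]
        basis.append(v)
    return basis

A = [[1, 2, 3], [2, 4, 6]]        # rank 1, so the corank is 3 - 1 = 2
N_cols = null_space_basis(A)
```

Here len(N_cols) equals the corank of A, in agreement with the rank-nullity theorem, and every returned vector b satisfies Ab = 0.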

Notation. Throughout this paper, we will be interested in the number of zero and nonzero entries in a 
matrix A. Let nnz(A) denote the number of nonzero entries in A. For a vector x, let $\|x\|_0$ denote the
number of nonzero entries in x. This notation refers to the quasi-norm $\ell_0$, which is not a true norm since
$\lambda\|x\|_0 \neq \|\lambda x\|_0$ in general, although it does honor the triangle inequality.

For vector x, let $x_i$ be the value of the i-th position in x. The support of x (supp(x)) is the set of indices in
x which correspond to nonzero values, $i \in \mathrm{supp}(x) \iff x_i \neq 0$.

The notation (A|B) denotes horizontal concatenation: each row of B is appended to the corresponding row of A,
so that the columns of B follow the columns of A. The notation $\binom{A}{B}$ denotes vertical concatenation: each
column of B is appended to the corresponding column of A, so that the rows of B lie below the rows of A.
$M = A \otimes B$ denotes the Kronecker product of two matrices, where M is formed by multiplying each individual entry in A by the
entire matrix B. (If A is m x n, B is p x q, then M is mp x nq.)
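
The notation above can be made concrete with a few small helpers. This is only an illustrative sketch; the names supp, l0, nnz and kron and the list-of-rows matrix representation are assumptions of the example, not part of the paper.

```python
def supp(x):
    """Set of indices of the nonzero entries of vector x."""
    return {i for i, v in enumerate(x) if v != 0}

def l0(x):
    """The l0 quasi-norm: number of nonzero entries of x."""
    return len(supp(x))

def nnz(A):
    """Number of nonzero entries of matrix A (a list of rows)."""
    return sum(l0(row) for row in A)

def kron(A, B):
    """Kronecker product: each entry of A is multiplied by all of B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

x = [0, 3, 0, -1]
y = [0, -3, 2, 0]
x_plus_y = [a + b for a, b in zip(x, y)]
K = kron([[1, 2], [3, 4]], [[0, 1]])   # (2 x 2) kron (1 x 2) gives 2 x 4
```

Scaling x by 2 leaves l0 unchanged (so homogeneity fails), while l0(x_plus_y) is at most l0(x) + l0(y), as the triangle inequality requires.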

By equivalent problems, we mean that reductions between them preserve approximation factors. A formal 
definition of approximation equivalence is found in Appendix [A].



2.2 Minimization problems 

In this section, we formally state the four major minimization problems discussed in this paper. The 
first two problems have vector solutions, and the last two problems have matrix solutions. Our results
hold when the variables are over Q, although these problems can be defined over R. For a problem F, $\mathcal{I}_F$ is the set of
input instances, $S_F(x)$ is the solution space for $x \in \mathcal{I}_F$, and $m_F(x, y)$ is the objective metric for $x \in \mathcal{I}_F$ and
$y \in S_F(x)$.

exact dictionary representation (EDR)

$\mathcal{I}_{EDR}$ = (D, s): m x n matrix D, vector s with $s \in \mathrm{col}(D)$
$S_{EDR}(D, s) = \{v \in \mathbb{Q}^n : Dv = s\}$
$m_{EDR}((D, s), v) = \|v\|_0$

min unsatisfy (MU)

$\mathcal{I}_{MU}$ = (A, y): m x n matrix A, vector $y \in \mathbb{Q}^m$
$S_{MU}(A, y) = \{x : x \in \mathbb{Q}^n\}$
$m_{MU}((A, y), x) = \|y - Ax\|_0$

sparse null space (SNS)

$\mathcal{I}_{SNS}$ = matrix A
$S_{SNS}(A) = \{N : N \text{ is a full null matrix for } A\}$
$m_{SNS}(A, N) = \mathrm{nnz}(N)$

matrix sparsification (MS)

$\mathcal{I}_{MS}$ = full rank m x n matrix B
$S_{MS}(B) = \{\text{matrix } N : N = BX \text{ for some invertible matrix } X\}$
$m_{MS}(B, N) = \mathrm{nnz}(N)$



2.3 Equivalences 

In closing the preliminary section, we show that min unsatisfy and exact dictionary representation are equivalent.
We then show that sparse null space and matrix sparsification are equivalent. The type of equivalence
is formally stated in Definition [14] and guarantees exact equality of approximation factors among
polynomial-time algorithms (Corollary [16]).



Equivalence of vector problems. Here we show that EDR and min unsatisfy are equivalent. 

We reduce EDR to min unsatisfy. Given input (D, s) to EDR, we seek a vector v with minimum $\|v\|_0$ that
satisfies Dv = s. Let y be any vector that satisfies Dy = s, and A be a full null matrix for D. (These can
be derived in polynomial time.) Let x = MU(A, y) and v = y - Ax. We claim that v is a solution to EDR.
First note that v satisfies Dv = s: Dv = D(y - Ax) = Dy - DAx = s - 0 = s. Now, the call to MU(A, y)
returned a vector x for which $\|y - Ax\|_0 = \|v\|_0$ is the minimization measure; and, as x ranges over all rational vectors of the appropriate dimension,
the vector v = y - Ax ranges over all vectors with Dv = s. Hence, the oracle for min unsatisfy directly
minimizes $\|v\|_0$, and so v is a solution to EDR.
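
The reduction from EDR to min unsatisfy can be traced on a tiny instance. The sketch below is illustrative only: the instance (D, s), the particular solution y and the null vector a are chosen by hand, and mu_single_column is a hypothetical toy oracle that is exact only when the matrix passed to MU has a single column (it tries every x that satisfies at least one equation, plus x = 0).

```python
from fractions import Fraction

def l0(v):
    """Number of nonzero entries of v."""
    return sum(1 for t in v if t != 0)

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def mu_single_column(a, y):
    """Toy exact MU oracle for a one-column matrix a (integer entries):
    returns the scalar x minimising ||y - a*x||_0."""
    candidates = [Fraction(0)] + [Fraction(yi, ai)
                                  for ai, yi in zip(a, y) if ai != 0]
    return min(candidates,
               key=lambda x: l0([yi - ai * x for ai, yi in zip(a, y)]))

D = [[1, 0, 1],
     [0, 1, 1]]
s = [1, 1]
y = [1, 1, 0]      # a particular solution: D y = s, with two nonzeros
a = [-1, -1, 1]    # null(D) is one-dimensional, so A = (a) is a full null matrix

x = mu_single_column(a, y)
v = [yi - ai * x for ai, yi in zip(a, y)]   # v = y - A x
```

The resulting v still satisfies Dv = s but has a single nonzero entry, which is optimal for this instance since s is nonzero.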

We now reduce min unsatisfy to EDR. Given input (A, y) to min unsatisfy, we seek a vector x which minimizes
$\|y - Ax\|_0$. We may assume that A is full rank. (Otherwise, we can simply take any matrix $\tilde{A}$ whose columns
form a basis of col(A); since col($\tilde{A}$) = col(A), it follows easily that MU($\tilde{A}$, y) and MU(A, y) have the same optimal value.) Find (in polynomial time)
a matrix D such that A is a full null matrix for D (this can be achieved by finding $D^T$ as a null matrix of
$A^T$). Let s = Dy, and v = EDR(D, s). Since Dv = s we have that D(y - v) = Dy - Dv = 0, from which
we conclude that y - v is in the null space of D, and therefore in the column space of A. It follows that
we can find an x such that Ax = y - v. We claim that x solves the instance of min unsatisfy: It suffices to
note that the call to EDR(D, s) minimizes $\|v\|_0 = \|y - Ax\|_0$, and that as v ranges over {v : Dv = s}, the
vector Ax = y - v ranges over all of col(A). In conclusion,

Lemma 1 The problems exact dictionary representation and min unsatisfy are equivalent. 

Equivalence of matrix problems. Here we demonstrate that sparse null space and matrix sparsification 

are equivalent. Recall that in the description of matrix sparsification on input matrix B, we required that
B be full rank, #col(B) = rank(B). (We could in fact allow #col(B) > rank(B), but this would trivially
result in #col(B) - rank(B) zero columns in the solution, and these columns are not interesting.) We will
need the following lemma:

Lemma 2 Let B be a full null matrix for m x n matrix A. The following statements are equivalent: (1) 
N = BX for some invertible matrix X. (2) N is a full null matrix for A. 

Proof. In both cases, N and B must have the same number of columns, the same rank, and the same 
span. This is all that is required to demonstrate either direction. □ 

We can now prove that sparse null space and matrix sparsification are equivalent. The problem sparse null
space may be solved utilizing an oracle for matrix sparsification. Given input A to sparse null space, create
(in polynomial time) a matrix B which is a full null matrix for A, and let N = MS(B). We claim that N
is a solution to SNS(A). Since N = BX for some invertible matrix X, by Lemma [2] N is a full null matrix
for A. Moreover, by Lemma [2] the solution spaces of MS(B) and SNS(A) coincide, and both problems
minimize nnz(N); therefore the call to MS(B) solves sparse null space on A.

We show that matrix sparsification can be solved using an oracle for sparse null space. Given input B to
matrix sparsification, create (in polynomial time) matrix A such that B is a full null matrix for A. Let
N = SNS(A). We claim that N is a solution to MS(B). By the lemma, N = BX for some invertible matrix
X, so N can be derived from B via elementary column operations. The call to SNS(A) finds an optimally
sparse N, which is equivalent to solving matrix sparsification on B. In conclusion,

Lemma 3 The problems matrix sparsification and sparse null space are equivalent. 

3 Hardness of approximation for matrix problems 

In this section, we prove the hardness of approximation of matrix sparsification (and therefore sparse null 
space). This motivates the search for heuristics or algorithms under simplifying assumptions for matrix 
sparsification, which we undertake in the next section. For the reduction, we will need a relatively dense 
matrix which we know cannot be further sparsified. We will prove the existence of such a matrix in the 
first subsection.

3.1 Unsparsifiable matrices



Any m x n matrix A may be column reduced to contain at most (m - r + 1)r nonzeros, where r = rank(A).
For example, Gaussian elimination on the columns of the matrix will accomplish this sparsification. We
will say that a rank r, m x n matrix A is completely unsparsifiable if and only if, for any invertible matrix
X, nnz(AX) ≥ (m - r + 1)r. A matrix A is optimally sparse if, for any invertible X, nnz(AX) ≥ nnz(A).
The main result of this section follows.

Theorem 4 Let A be an m x n matrix with m ≥ n. If every square submatrix of A is nonsingular, then
A has rank n and is completely unsparsifiable. Moreover, in such case the matrix $\binom{I}{A}$ is optimally sparse,
where I is the n x n identity matrix.

Before attempting a proof of the theorem, we need a few intermediate results. 

Lemma 5 Matrix A is optimally sparse if and only if, for any vector x ≠ 0, $\|Ax\|_0 \ge \max_{i \in \mathrm{supp}(x)} \|a_i\|_0$.

Proof. Suppose that there exists an x such that, for some i ∈ supp(x), $\|Ax\|_0 < \|a_i\|_0$. Then we may replace
the matrix column $a_i$ by Ax, and create a matrix with the same rank as A which is sparser than A;
a contradiction. Similarly, suppose that A is not optimally sparse, so that there exists B = AX with
nnz(B) < nnz(A), for some invertible X. Assume without loss of generality that the diagonal of X is full,
$x_{ii} \neq 0$ (otherwise just permute the columns of X to make it so). Then there must exist an index j ∈ [n]
with $\|b_j\|_0 < \|a_j\|_0$, and we have $\|Ax_j\|_0 = \|b_j\|_0 < \|a_j\|_0 \le \max_{i \in \mathrm{supp}(x_j)} \|a_i\|_0$, since $x_{jj} \neq 0$. □

A submatrix A(R, C) is row-inclusive iff r ∉ R implies that A(r, C) is not in the row span of A(R, C).
In other words, A(R, C) includes all the rows of A(:, C) which are in the row span of this submatrix. A
submatrix A(R, C) is a candidate submatrix of A (written $A(R, C) \lhd A$) if and only if A(R, C) is both
row-inclusive and rank(A(R, C)) = |C| - 1. This last property is equivalent to stating that the columns of
A(R, C) form a circuit - they are minimally linearly dependent. We can potentially zero out |R| entries
of A by using the column dependency of A(R, C); being row-inclusive means there would be exactly |R|
zeros in the modified column of A.

The next lemma demonstrates the close relationship between candidate submatrices and vectors x which 
may sparsify A as in Lemma [5].

Lemma 6 For any m x n matrix A: (1) For any x ≠ 0 and i ∈ supp(x), there exists $A(R, C) \lhd A$ for
which $|R| \ge m - \|Ax\|_0$, and i ∈ C ⊆ supp(x). (2) For any $A(R, C) \lhd A$ there exists a vector x for which
supp(x) = C and $\|Ax\|_0 = m - |R|$.

Proof. Part 1: Let R' = [m] - supp(Ax) (where [m] = {1, 2, ..., m}), and choose C so that i ∈ C ⊆
supp(x), and the columns of A(R', C) form a circuit. (Note that the columns of A(R', supp(x)) are dependent
since A(R', :)x = 0.) Now expand R' to R so that A(R, C) is row-inclusive. Then rank(A(R, C)) =
rank(A(R', C)) = |C| - 1, so that $A(R, C) \lhd A$.

Part 2: Since the columns of A(R, C) form a circuit, there is an $\hat{x}$ with $\hat{x}_i \neq 0$ for all i and $A(R, C)\hat{x} = 0$. Then
$\dim(\mathrm{col}(A(R, C)^T)) = |C| - 1 = \dim(\mathrm{null}(\hat{x}^T))$ and also $\mathrm{col}(A(R, C)^T) \subseteq \mathrm{null}(\hat{x}^T)$, which together imply
$\mathrm{col}(A(R, C)^T) = \mathrm{null}(\hat{x}^T)$. So $A(r, C)\hat{x} = 0$ is true iff r ∈ R (using the fact that A(R, C) is row-inclusive).
Now choose x so that $x(C) = \hat{x}$ and all other coordinates are zero; then supp(Ax) = [m] - R. □

The following is an immediate consequence of the lemma, and is crucial to our proof of Theorem [4].

Corollary 7 The m x n matrix A is optimally sparse if and only if there is no candidate submatrix
$A(R, C) \lhd A$ with $m - |R| < \|a_i\|_0$ for some i ∈ C.

We are now ready to prove the theorem. 

Proof of Theorem [4]. Let $B = \binom{I}{A}$. We prove that B is optimally sparse. Suppose $B(R, C) \lhd B$.
Let $R_I = R \cap [n]$ and $R_A = R - [n]$. Now $B(R_I, C)$ is a submatrix of I with dependent columns, so
$B(R_I, C) = 0$. By row-inclusiveness, $R_I$ must include all zero rows in B([n], C), so $|R_I| = n - |C|$. Since
$B(R_I, C) = 0$, it follows that $\mathrm{rank}(B(R_A, C)) = \mathrm{rank}(B(R, C)) = |C| - 1$, and $|R_A| \ge |C| - 1$. Any |C| x |C|
subsquare of $B(R_A, C)$ would make the rank at least |C|, so we must have $|R_A| < |C|$; thus $|R_A| = |C| - 1$.
Combined with $|R_I| = n - |C|$, this implies that |R| = n - 1. Then $m + n - |R| = m + 1 = \|b_i\|_0$ for any
column $b_i$ of B, proving that B is optimally sparse by Corollary [7].

Recall that Gaussian elimination on matrix A → G yields nnz(G) = (m - n + 1)n. Now suppose there
is an invertible matrix X with nnz(AX) < (m - n + 1)n. Then $\mathrm{nnz}(BX) = \mathrm{nnz}\binom{X}{AX} < n^2 + (m - n + 1)n = (m + 1)n$,
contradicting the optimal sparsity of B (which has exactly (m + 1)n nonzeros). Hence no such X exists and A is completely
unsparsifiable. □
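
The count (m - n + 1)n used in the proof above can be checked on a small example. The following sketch (an illustration, not the paper's code) column-reduces a matrix by running Gauss-Jordan elimination on its transpose; the helper names rref and column_reduce are assumptions of the example. The 4 x 2 test matrix has every square submatrix nonsingular, so Theorem 4 applies to it.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form over Q (exact arithmetic)."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

def column_reduce(A):
    """Gaussian elimination on the columns of A (row-reduce the transpose)."""
    At = [list(col) for col in zip(*A)]
    return [list(row) for row in zip(*rref(At))]

def nnz(A):
    return sum(1 for row in A for v in row if v != 0)

A = [[1, 1], [1, 2], [1, 3], [1, 4]]   # m = 4, n = 2, entries i^0 and i^1
G = column_reduce(A)
m, n = len(A), len(A[0])
```

For this A, nnz(G) equals (m - n + 1)n = 6, which by Theorem 4 is also the best that any invertible column transformation can achieve.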

3.2 Efficiently building an unsparsifiable matrix 

The next lemma establishes that we can easily construct an unsparsifiable matrix with a given column, a 
useful fact for the reductions to follow. 

Lemma 8 If n x n matrix M = ($M_{ij}$) has entries $M_{ij} = i^{p_j}$ for distinct positive reals $p_1, p_2, \ldots, p_n$, then
every subsquare of M is nonsingular.

Proof. Let f be a signomial (a polynomial allowed to have nonintegral exponents). We define positive_zeros(f) :=
{x : x > 0 & f(x) = 0} and #sign_changes(f) := #{i : $\mu_i \mu_{i+1} < 0$}, where $f = \sum_i \mu_i x^{p_i}$ and no $\mu_i = 0$. A
slight generalization of Descartes' rule of signs [35] states that

#positive_zeros(f) ≤ #sign_changes(f)    (1)

Consider any k x k subsquare M(R, C) given by R = {$r_1, \ldots, r_k$}, C = {$c_1, \ldots, c_k$} ⊆ [n], and any nonzero
vector $\mu \in \mathbb{R}^k$. Then M(R, C)·μ matches the signomial $f(x) = \sum_i \mu_i x^{p_{c_i}}$ evaluated at $x = r_1, \ldots, r_k$.
Using (1), #{i : $f(r_i) = 0$} ≤ #sign_changes(f) < k, so that some $f(r_i) \neq 0$, and M(R, C)μ ≠ 0. Hence
the subsquare has a trivial kernel, and is nonsingular. □

To avoid problems of precision, we will choose the powers $p_j$ to be consecutive integers beginning at 0. This
yields the Vandermonde matrix over Q. It can also be shown, by an elementary cardinality argument, that 
a random matrix (using a non-atomic distribution) is unsparsifiable with probability 1 over infinite fields. 
The above lemma avoids any probability and allows us to construct such a matrix as quickly as we can 
iterate over the entries. 
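
For small n, the property in Lemma 8 can be verified exhaustively. The sketch below is an illustration under the integer-power choice just described (exponents 0, 1, ..., n - 1 and rows i = 1, ..., n); the helper names det, power_matrix and all_subsquares_nonsingular are assumptions of the example, and the exhaustive check is exponential in n.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Exact determinant by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                      # a row swap flips the sign
        d *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return d

def power_matrix(n):
    """n x n matrix with entry i**p for rows i = 1..n and powers p = 0..n-1."""
    return [[i ** p for p in range(n)] for i in range(1, n + 1)]

def all_subsquares_nonsingular(M):
    """Check every k x k submatrix (any rows R, any columns C) of square M."""
    n = len(M)
    for k in range(1, n + 1):
        for R in combinations(range(n), k):
            for C in combinations(range(n), k):
                if det([[M[r][c] for c in C] for r in R]) == 0:
                    return False
    return True

M = power_matrix(4)
```

For n = 4 this inspects 69 submatrices; the full determinant is the Vandermonde product of the differences of the nodes 1, 2, 3, 4.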

3.3 Reduction for matrix problems 

After proving the existence of an unsparsifiable matrix in the last section, we can now prove the hardness 
of approximation of matrix sparsification. We reduce min unsatisfy to matrix sparsification. Given an instance 
(A,y) of min unsatisfy, we create a matrix M such that matrix sparsification on M solves the instance of 
min unsatisfy.

Before describing the reduction, we outline the intuition behind it. We wish to create a matrix M with
many copies of y and some copies of A. The number of copies of y should greatly outnumber the number 
of copies of A. The desired approximation bounds will be achieved by guaranteeing that M is composed 
mostly of zero entries and of copies of y. It follows that minimizing the number of nonzero entries in the 
matrix (solving matrix sparsification) will reduce to minimizing the number of nonzero entries in the copies 
of y by finding a sparse linear combination of y with some other dictionary vectors (solving min unsatisfy). 

The construction is as follows: Given an instance (A, y) of min unsatisfy (where A is an m x n matrix,
$y \in \mathbb{Q}^m$, and q ≥ p are free parameters), take an optimally sparse (p + q) x p matrix $\binom{I_p}{X}$ as given by
Lemma [8] and Theorem [4] (where $I_p$ is a p x p identity matrix), and create matrix $M_l = \binom{I_p}{X} \otimes y = \binom{I_p \otimes y}{X \otimes y}$
(of size (p + q)m x p). Further create matrix $I_q \otimes A$ (of size qm x qn), and take the zero matrix 0 (of size pm x qn)
and form matrix $M_r = \binom{0}{I_q \otimes A}$ (of size (p + q)m x qn). Append $M_r$ to the right of $M_l$ to create matrix
$M = (M_l | M_r)$ of size (p + q)m x (p + qn). We can summarize this construction as

$$M = \begin{pmatrix} I_p \otimes y & 0 \\ X \otimes y & I_q \otimes A \end{pmatrix}.$$

$M_l$ is composed of p + pq m-length vectors, all corresponding to copies of y. $M_r$ is composed of qn m-length
vectors, all corresponding to copies of vectors in A. By choosing $p = q = n^2$, we ensure that the term pq is
larger than qn by a factor of n. Note that M now contains $O(n^3)$ columns.
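
The block structure of M can be assembled directly from Kronecker products. The sketch below is illustrative: build_M, kron and identity are names introduced for this example, and X is taken to be the q x p matrix with entries i^e (rows i = 1..q, exponents e = 0..p-1), following the integer-power choice of Section 3.2.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

def build_M(A, y, p, q):
    """Assemble M = [[I_p (x) y, 0], [X (x) y, I_q (x) A]] as a list of rows."""
    m, n = len(A), len(A[0])
    X = [[i ** e for e in range(p)] for i in range(1, q + 1)]   # q x p
    y_col = [[v] for v in y]                                    # m x 1
    top_left = kron(identity(p), y_col)       # pm x p
    bottom_left = kron(X, y_col)              # qm x p
    bottom_right = kron(identity(q), A)       # qm x qn
    top = [row + [0] * (q * n) for row in top_left]
    bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
    return top + bottom

A = [[1, 2], [3, 4], [5, 6]]    # m = 3, n = 2
y = [1, 0, 2]
p = q = 4                       # p = q = n^2 with n = 2
M = build_M(A, y, p, q)
left_nnz = sum(1 for row in M for v in row[:p] if v != 0)
```

With these parameters M has (p + q)m = 24 rows and p + qn = 12 columns, and the left block contributes p(q + 1) nonzero copies of each nonzero entry of y.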

It follows that the number of zeros in M depends mostly on the number of zeros induced by a linear 
combination of dictionary vectors that include y. Because $M_l$ is unsparsifiable, vectors in the rows of
$M_l$ will not contribute to sparsifying other vectors in these rows; only vectors in $M_r$ (which are copies
of the vectors of A) may sparsify vectors in $M_l$ (which are copies of y). It follows that an
approximation to matrix sparsification will yield a similar approximation - within a factor of l+n~3 - to min 
unsatisfy, and that matrix sparsification is hard to approximate within a factor $2^{\log^{.5-o(1)} n^{1/3}} = 2^{\log^{.5-o(1)} n}$
of optimal (assuming NP does not admit quasi-polynomial time deterministic algorithms).

4 Solving matrix sparsification through min unsatisfy 

In the previous section we showed that matrix sparsification is hard to approximate. This motivates the 
search for heuristics and algorithms under simplifying assumptions for matrix sparsification. In this section 
we show how to extend algorithms and heuristics for min unsatisfy to apply to matrix sparsification - and 
hence sparse null space - while preserving approximation guarantees. (Note that this result is distinct from 
the hardness result; neither one implies the other.) 

We first present an algorithm for matrix sparsification which is in essence identical to the one given by 
Coleman and Pothen [9] for sparse null space. The algorithm assumes the existence of an oracle for a 
problem we will call the sparsest independent vector problem. The algorithm makes a polynomial number 
of queries to this oracle, and yields an optimal solution to matrix sparsification. 

The sparsest independent vector problem takes full-rank input matrices A and B, where the columns of 
B are a contiguous set of right-most columns from A (informally, one could say that B is a suffix of A, 
in terms of columns). The output is the sparsest vector in the span of A but not in the span of B. For 
convenience, we add an extra output parameter: a column of A \ B which can be replaced by the sparsest
independent vector while preserving the span of A. More formally, sparsest independent vector is defined
as follows. (See Appendix [A] for the definition of a problem instance.)

sparsest independent vector (SIV)

$\mathcal{I}_{SIV}$ = (A, B): A is an m x n full rank matrix with A = (C|B) for some non-empty
matrix C.
$S_{SIV}(A, B) = \{a : a \in \mathrm{col}(A), a \notin \mathrm{col}(B)\}$
$m_{SIV}((A, B), a) = \mathrm{nnz}(a)$

The following algorithm reduces matrix sparsification on an m x n input matrix A to making a polynomial
number of queries to an oracle for sparsest independent vector: 

Algorithm Matrix_Sparsification(A)
B <- null
for i = n to 1:
    (b_i, a_j) = SIV(A, B)
    A <- (A \ {a_j} | b_i)
    B <- (b_i | B)
return B

This greedy algorithm sparsifies the matrix A by generating a new matrix B one column at a time. The 
first-added column ($b_n$) is the sparsest possible, and each subsequent column is the next sparsest. It is
decidedly non-obvious why such a greedy algorithm would actually succeed; we refer the reader to [9] where
it is proven that greedy algorithms yield an optimal result on matroids such as the set of vectors in col(A).
Our first contribution is in expanding the result of [9] as follows. 

Lemma 9 Let subroutine SIV in algorithm Matrix_Sparsification be a λ-approximation oracle for sparsest
independent vector. Then the algorithm yields a λ-approximation to matrix sparsification.

Proof. Given m x n matrix A, suppose C exactly solves MS(A), and that the columns $c_1, \ldots, c_n$ of C
are sorted in decreasing order by number of nonzeros. Let $s_i = \|c_i\|_0$; then $s_1 \ge s_2 \ge \ldots \ge s_n$. As already
mentioned, given a true oracle to sparsest independent vector, algorithm Matrix_Sparsification would first
discover a column with $s_n$ nonzeros, then a column with $s_{n-1}$ nonzeros, etc.

Now suppose algorithm Matrix_Sparsification made calls to a λ-approximation oracle for sparsest independent
vector. The first column generated by the algorithm, call it $b_n$, will have at most $\lambda s_n$ nonzeros, since the
optimal solution has $s_n$ nonzeros. The second column generated will have at most $\lambda s_{n-1}$ nonzeros, since
the optimal solution to the call to SIV has no more than $s_{n-1}$ nonzeros: even if $b_n$ is suboptimal, it is true
that at least one of $c_n$ or $c_{n-1}$ is a feasible solution to SIV(A, $b_n$).

More generally, the column $b_i$ found by the algorithm has no more than $\lambda s_i$ nonzeros, since at least one of
$\{c_n, \ldots, c_i\}$ is a feasible solution to the corresponding query to SIV. Thus we have $\mathrm{nnz}(B) = \sum_i \|b_i\|_0 \le \lambda \sum_i \|c_i\|_0 =
\lambda\,\mathrm{nnz}(C)$, and may conclude that the algorithm yields a λ-approximation to matrix sparsification. □

It follows that in order to utilize the aforementioned algorithm for matrix sparsification, we need some 
algorithm for sparsest independent vector. This is in itself problematic, as the sparsest independent vector 
problem is hard to approximate - in fact, we will demonstrate later that sparsest independent vector is as 
hard to approximate as min unsatisfy. Hence, although we have extended the algorithm of [9] to make use 
of an approximation oracle for sparsest independent vector, the benefit of this algorithm remains unclear. 

To this end, we will show how to solve sparsest independent vector while making queries to an approximate 
oracle for min unsatisfy. This algorithm preserves the approximation ratio of the oracle. This implies that 
all algorithms for min unsatisfy immediately carry over to sparsest independent vector, and further that they 
carry over to matrix sparsification as well. This also implies a useful tool for applying heuristics for min
unsatisfy to the other problems.

The problem sparsest independent vector on input $(A, B)$ asks to find the sparsest vector in the span of $A$
but not in the span of $B$. It is not difficult to see that min unsatisfy solves a similar problem: Given a
matrix $A$ and target vector $y$ not in the span of $A$, find the sparsest vector in the span of $(A|y)$ but not
in the span of $A$. Hence, if we query the oracle for min unsatisfy once for each vector $a_j \notin \operatorname{col}(B)$, one of
these queries must return the solution for the sparsest independent vector problem. This discussion implies
the following algorithm:

Algorithm Sparse_Independent_Vector(A, B)
    s ← m + 1
    for j = 1 to n:
        if a_j ∉ col(B):
            A_j ← A \ {a_j}
            x ← MU(A_j, a_j)
            c' ← a_j − A_j x
            if ||c'||_0 < s:
                c ← c'; s ← ||c||_0; α ← a_j
    return (c, α)

Note that when this algorithm is given a $\lambda$-approximate oracle for min unsatisfy, it yields a $\lambda$-approximate
algorithm for sparsest independent vector. (In this case, the approximation algorithm is valid over the field 
for which the oracle is valid.) 
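The algorithm above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: matrices are lists of columns over the rationals, the columns of B are assumed to be a subset of the columns of a full-column-rank A, and the MU oracle is replaced by a hypothetical exponential-time brute-force search (`min_unsatisfy`). The helper names `solve` and `in_span` are likewise introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

def solve(rows, rhs, n):
    """Return one exact solution x of rows @ x = rhs (n unknowns), or None."""
    M = [[Fraction(a) for a in r] + [Fraction(b)] for r, b in zip(rows, rhs)]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: free variable, left at 0
        M[r], M[p] = M[p], M[r]
        M[r] = [a / M[r][c] for a in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
    if any(row[n] != 0 for row in M[r:]):
        return None                       # inconsistent system
    x = [Fraction(0)] * n
    for i, c in pivots:
        x[c] = M[i][n]
    return x

def in_span(cols, v):
    """True if v is a linear combination of the given columns."""
    rows = [[col[i] for col in cols] for i in range(len(v))]
    return solve(rows, v, len(cols)) is not None

def min_unsatisfy(cols, y):
    """Brute-force stand-in for an MU oracle: x minimizing ||y - Ax||_0."""
    m, n = len(y), len(cols)
    rows = [[col[i] for col in cols] for i in range(m)]
    for k in range(m, -1, -1):            # satisfy as many equations as possible
        for keep in combinations(range(m), k):
            x = solve([rows[i] for i in keep], [y[i] for i in keep], n)
            if x is not None:
                return x

def sparsest_independent_vector(A_cols, B_cols, mu_oracle=min_unsatisfy):
    """One MU query per column a_j outside col(B); keep the sparsest residual."""
    best, s, alpha = None, len(A_cols[0]) + 1, None
    for j, a_j in enumerate(A_cols):
        if in_span(B_cols, a_j):
            continue
        A_j = A_cols[:j] + A_cols[j + 1:]
        x = mu_oracle(A_j, a_j)
        c = [a - sum(col[i] * xi for col, xi in zip(A_j, x))
             for i, a in enumerate(a_j)]
        nnz = sum(1 for v in c if v != 0)
        if nnz < s:
            best, s, alpha = c, nnz, a_j
    return best, alpha

# Example: A has columns a1, a2, a3 and B consists of the column a3.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
B = [[1, 0, 0]]
c, alpha = sparsest_independent_vector(A, B)
```

Swapping `min_unsatisfy` for any approximate MU routine leaves the outer loop unchanged, which is the point of the reduction: the quality of the answer is inherited from the oracle.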

We conclude this section by giving hardness results for sparsest independent vector by reduction from min
unsatisfy; we show that any instance $(A, y)$ of min unsatisfy may be modeled as an instance $(A', B')$ of
sparsest independent vector: Let $A' = (A|y)$, and $B' = A$. This suffices to force the linear combination to
include $y$. It follows that sparsest independent vector is as hard to approximate as min unsatisfy, and in fact
that the two problems are approximation equivalent.

4.1 Approximation algorithms 

We have presented a tool for extending algorithms and heuristics for exact dictionary representation to min 
unsatisfy and then directly to the matrix problems. When these algorithms make assumptions on the 
dictionary of EDR, it is necessary to investigate how these assumptions carry over to the other problems. 

To this end, we consider here one of the most popular heuristics for EDR, $\ell_1$-minimization, and the case
where it is guaranteed to provide the optimal result. The heuristic is to find a vector $v$ that satisfies
$Dv = s$, while minimizing $\|v\|_1$ instead of $\|v\|_0$. (See [34], [33], [11] for more details.) In [13], Fuchs shows
that under the following relatively simple condition $\ell_1$-minimization provides the optimal answer to EDR.

In the following, we write $\operatorname{sgn}(x)$ to indicate $x/|x|$, or zero if $x = 0$ (applied coordinatewise to vectors). Given a matrix $D$ whose columns are
divided into two submatrices $D_0$ and $D_1$, we may write $D = (D_0\ D_1)$, even though $D_0$ and $D_1$ may not be
contiguous portions of the full matrix. (The reader may view this as permuting the columns of $D$ before
splitting into $D_0$ and $D_1$.)

Theorem 10 (Fuchs) Suppose that $s = Dv$, and that $\|v\|_0$ is minimal (so that this $v$ solves EDR($D$, $s$)).
Split $D = (D_0\ D_1)$ so that $D_0$ contains all the columns in the support of $v$. Accordingly, we split the vector
$v = \binom{v_0}{0}$, in which all coordinates of $v_0$ are nonzero.

If there exists a vector $h$ so that $D_0^T h = \operatorname{sgn}(v_0)$, and $\|D_1^T h\|_\infty < 1$, then $\|v\|_1 < \|w\|_1$ for all vectors
$w \ne v$ with $Dw = s$.
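As a small illustration of the condition in Theorem 10, the following Python sketch checks whether a given candidate vector h certifies it. The function name `satisfies_fuchs_condition` and the example data are hypothetical; the sketch only verifies the two requirements on h and does not search for h, nor does it verify that v is a sparsest solution.

```python
from fractions import Fraction

def sgn(x):
    """Sign of a real number: 1, -1, or 0."""
    return (x > 0) - (x < 0)

def satisfies_fuchs_condition(D_cols, v, h):
    """Check D_0^T h = sgn(v_0) and ||D_1^T h||_inf < 1 for a candidate h.

    D_cols lists the columns of D; D_0 consists of the columns on the
    support of v, and D_1 of the remaining columns.
    """
    def dot(col):
        return sum(c * hi for c, hi in zip(col, h))

    on_support = all(dot(col) == sgn(vj)
                     for col, vj in zip(D_cols, v) if vj != 0)
    off_support = all(abs(dot(col)) < 1
                      for col, vj in zip(D_cols, v) if vj == 0)
    return on_support and off_support

# Example: s = (2, 0) is represented by the first column alone.
D = [[1, 0], [0, 1], [Fraction(1, 2), Fraction(1, 2)]]
v = [2, 0, 0]
good = satisfies_fuchs_condition(D, v, [1, 0])   # certificate holds
bad = satisfies_fuchs_condition(D, v, [1, 1])    # |d_2 . h| = 1 is not < 1
```

In this example the competing representation w = (1, -1, 2) of the same s has a strictly larger 1-norm than v, in line with the conclusion of the theorem.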

We extend this result to each of our major problems. 
Theorem 11 min unsatisfy. Suppose, for a given $A$, $y$ pair, that $x$ minimizes $\|y - Ax\|_0$. Split $y = \binom{y_0}{y_1}$
and $A = \binom{A_0}{A_1}$ so that $A_1$ is maximal such that $y_1 = A_1 x$, and let $v = y_0 - A_0 x$. If there is a vector $u$ with
$\|u\|_\infty < 1$ and $A_1^T u = -A_0^T \operatorname{sgn}(v)$, then our reduction of MU($A$, $y$) to an $\ell_1$ approximation of EDR($D$, $s$)
gives the truly optimal answer.

matrix sparsification. For a given $m \times n$ matrix $B$, suppose $C$ minimizes $\operatorname{nnz}(C)$ such that $C = BX$ for
invertible $X$. For any $i \in [n]$, split column $c_i = \binom{c_{i,0}}{0}$ so that $c_{i,0}$ is completely nonzero, and, respectively,
$B = \binom{B_{i,0}}{B_{i,1}}$, so that $c_{i,0} = B_{i,0} x_i$. If, for all $i \in [n]$, there exists vector $u_i$ with $\|u_i\|_\infty < 1$ and
$B_{i,1}^T u_i = -B_{i,0}^T \operatorname{sgn}(c_{i,0})$, then our reduction algorithm to an $\ell_1$ approximation of EDR via min unsatisfy will give a
truly optimal answer to this MS instance.

sparse null space. For a given matrix $A$ with corank $c$, suppose matrix $V$ solves SNS($A$). For each $i \in [c]$, split
column $v_i = \binom{v_{i,0}}{0}$ so that $v_{i,0}$ is completely nonzero and, respectively, $A = (A_{i,0}\ A_{i,1})$ so that $A_{i,0} v_{i,0} = 0$.
If, for all $i \in [c]$, there exists vector $h_i$ with $\|A_{i,1}^T h_i\|_\infty < 1$ and $A_{i,0}^T h_i = \operatorname{sgn}(v_{i,0})$, then our reduction
to an $\ell_1$ approximation of EDR via matrix sparsification and min unsatisfy gives a truly optimal answer to
this SNS instance.

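Proof. min unsatisfy. As in our reduction from MU to EDR, we find matrix $D$ with $DA = 0$ and vector
$s = Dy$. Then

$\binom{\operatorname{sgn}(v)}{u} \in \operatorname{null}(A^T) = \operatorname{col}(D^T) \implies \exists h : D^T h = \binom{\operatorname{sgn}(v)}{u}$.

Splitting $D = (D_0\ D_1)$, we see that $D_0^T h = \operatorname{sgn}(v)$ and $\|D_1^T h\|_\infty < 1$, exactly what is required for Theorem
10, showing that $\ell_1$ minimization gives the answer $D_0 v_0$ (here $v_0 = v$ in the notation of Theorem 10). Since
$D_0 v_0 = (D_0\ D_1)\binom{v_0}{0} = D(y - Ax) = s$, this completes the proof.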

matrix sparsification. We write $A \setminus i$ to denote matrix $A$ with the $i$th column removed. In our reduction of
MS to MU, we need to solve instances of MU over equations of the form $(B \setminus i)x = b_i$. According to the
MU portion of this theorem, it suffices to show that $(B_{i,1} \setminus i)^T u_i = -(B_{i,0} \setminus i)^T \operatorname{sgn}(c_{i,0})$. The condition for
this portion of the theorem implies this, since removing any corresponding rows from a matrix equation of
the form $Ax = By$ still preserves the equality.

sparse null space. As in our reduction from SNS to MS, we find a matrix $B$ such that $A$ is a full null matrix
for $B$. For any $i$, let $u_i = A_{i,1}^T h_i$ so that $A^T h_i = \binom{\operatorname{sgn}(v_{i,0})}{u_i}$. Then $\binom{\operatorname{sgn}(v_{i,0})}{u_i} \in \operatorname{col}(A^T) = \operatorname{null}(B^T)$, and
$B_{i,1}^T u_i = -B_{i,0}^T \operatorname{sgn}(v_{i,0})$, which is exactly what is necessary for matrix sparsification to function through $\ell_1$
approximation. □

The following intuitive conditions give insight into which matrices are amenable to $\ell_1$ approximations. $A^+$
denotes $(A^T A)^{-1} A^T$, the pseudoinverse of $A$.

Corollary 12 min unsatisfy. Suppose matrix $A = \binom{A_0}{A_1}$ is split by an optimal answer as in Theorem 11. If
$\operatorname{row}(A_0) \subset \operatorname{row}(A_1)$ and $\|(A_1^T)^+ A_0^T\|_{1,1} < 1$, our $\ell_1$ approximation scheme will give a truly optimal answer.

matrix sparsification. Suppose matrix $B = \binom{B_{i,0}}{B_{i,1}}$ is split by the columns of an optimal answer $C = BX$ as
in Theorem 11. If, for any $i$, $\operatorname{row}(B_{i,0}) \subset \operatorname{row}(B_{i,1})$ and $\|(B_{i,1}^T)^+ B_{i,0}^T\|_{1,1} < 1$, then our $\ell_1$ approximation
will give the optimal answer.

sparse null space. Suppose matrix $A = (A_{i,0}\ A_{i,1})$ is split by the columns of an optimal answer $V$ with
$AV = 0$ as in Theorem 11. If $\operatorname{col}(A_{i,0}) \subset \operatorname{col}(A_{i,1})$ and $\|A_{i,1}^+ A_{i,0}\|_{1,1} < 1$ $\forall i$, then our $\ell_1$ approximation will
give an optimal answer.
A Approximation equivalence

Here we define approximation equivalence. Some of our notation and definitions are inspired by pQ, which 
itself built upon [19].



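Definition 13 An optimization problem is a four-tuple $F = \{\mathcal{X}_F, \mathcal{S}_F, M_F, \mathrm{opt}_F\}$, where $\mathcal{X}_F$ is the set of
input instances, $\mathcal{S}_F(x)$ is the solution space for $x \in \mathcal{X}_F$, $M_F(x, y)$ is the objective metric for $x \in \mathcal{X}_F$ and
$y \in \mathcal{S}_F(x)$, and $\mathrm{opt}_F \in \{\min, \max\}$.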



We will assume throughout the paper that $\mathrm{opt}_F = \min$.

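For any optimization problem $F$ and $x \in \mathcal{X}_F$, we define $F(x) = \operatorname{argmin}_{y \in \mathcal{S}_F(x)} M_F(x, y)$ and $\|F(x)\| =
M_F(x, F(x))$. An approximation $\tilde{F}$ to $F$ is any map on $\mathcal{X}_F$ with $\tilde{F}(x) \in \mathcal{S}_F(x)$. We write $\|\tilde{F}(x)\|$ for
$M_F(x, \tilde{F}(x))$. $\tilde{F}$ is a $\lambda$-approximation for $F$ when, for all $x \in \mathcal{X}_F$, $\frac{\|\tilde{F}(x)\|}{\|F(x)\|} \le \lambda$.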

Definition 14 Given optimization problems $F$ and $G$, an exact reduction from $F$ to $G$ is a pair $(t_1, t_2)$
that satisfies the following: (1) $t_1, t_2 \in P$. (2) $t_1 : \mathcal{X}_F \to \mathcal{X}_G$ and for all $x \in \mathcal{X}_F$, $y \in \mathcal{S}_G(t_1(x))$, we have
$t_2(x, y) \in \mathcal{S}_F(x)$. (3) For all $x \in \mathcal{X}_F$, $y \in \mathcal{S}_G(t_1(x))$, we have $M_F(x, t_2(x, y)) = M_G(t_1(x), y)$. (4) For all
$x \in \mathcal{X}_F$, $\|F(x)\| \ge \|G(t_1(x))\|$.

When such a reduction exists, we write $F \preceq G$. We write $F \sim G$ to denote that $F \preceq G$ and $G \preceq F$, and call these problems equivalent.



Theorem 15 If $F \preceq G$ and $G$ admits a $\lambda$-approximation, then so does $F$.



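Proof. We are given that for $G$, there exists $\tilde{G}$ with $\frac{\|\tilde{G}(x)\|}{\|G(x)\|} \le \lambda$ for all $x \in \mathcal{X}_G$. Let $\tilde{F}(x) = t_2(x, \tilde{G}(t_1(x)))$,
where $(t_1, t_2)$ is the $F \preceq G$ exact reduction. It suffices to show that $\frac{\|\tilde{F}(x)\|}{\|F(x)\|} \le \frac{\|\tilde{G}(x')\|}{\|G(x')\|}$, where $x' = t_1(x)$.

By the fourth item of the definition, it suffices to demonstrate that $\|\tilde{F}(x)\| \le \|\tilde{G}(x')\|$. By the third item,
$\|\tilde{F}(x)\| = M_F(x, t_2(x, \tilde{G}(x'))) = M_G(x', \tilde{G}(x')) = \|\tilde{G}(x')\|$. □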
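The construction in this proof, composing an approximation for G with the two maps of an exact reduction, can be sketched in Python. Everything below is a toy illustration under stated assumptions: the helper `lift_approximation` and the two toy problems (F: pick a shortest word from a list; G: pick the index of a smallest number) are hypothetical and serve only to show how the maps compose.

```python
def lift_approximation(t1, t2, g_approx):
    """Build an approximation for F from an exact reduction (t1, t2)
    and an approximation g_approx for G: x -> t2(x, g_approx(t1(x)))."""
    def f_approx(x):
        return t2(x, g_approx(t1(x)))
    return f_approx

# Toy problem G: instance = list of positive numbers, solution = an index,
# objective = the number at that index (to be minimized).
# Toy problem F: instance = list of words, solution = a word from the list,
# objective = its length (to be minimized).

def t1(words):
    """Map an F-instance to a G-instance."""
    return [len(w) for w in words]

def t2(words, index):
    """Map a G-solution back to an F-solution with the same objective value."""
    return words[index]

def g_approx(values):
    """Toy 2-approximate oracle for G: first index within twice the minimum."""
    best = min(values)
    return next(i for i, v in enumerate(values) if v <= 2 * best)

f_approx = lift_approximation(t1, t2, g_approx)
words = ["banana", "fig", "kiwi", "apple"]
answer = f_approx(words)
```

Here the word returned has length at most twice the optimum for F, mirroring the ratio guarantee of the oracle for G.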



In fact, it can be shown that $\frac{\|\tilde{F}(x)\|}{\|F(x)\|} = \frac{\|\tilde{G}(x')\|}{\|G(x')\|}$.
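One way to see this, under the reading that the claim is equality of the two approximation ratios, is sketched below. Here $\tilde{F}$ and $\tilde{G}$ denote the approximations from the proof of Theorem 15, $x' = t_1(x)$, and the item numbers refer to Definition 14.

```latex
\begin{align*}
\|F(x)\| &\ge \|G(x')\|
  && \text{by item (4)},\\
\|F(x)\| &\le M_F\bigl(x, t_2(x, G(x'))\bigr) = M_G\bigl(x', G(x')\bigr) = \|G(x')\|
  && \text{by items (2), (3) and minimality of } F(x),\\
\frac{\|\tilde{F}(x)\|}{\|F(x)\|} &= \frac{\|\tilde{G}(x')\|}{\|G(x')\|}
  && \text{since } \|\tilde{F}(x)\| = \|\tilde{G}(x')\| \text{ and } \|F(x)\| = \|G(x')\|.
\end{align*}
```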

Corollary 16 If $F \sim G$, then $F$ admits a $\lambda$-approximation if and only if $G$ admits a $\lambda$-approximation.



B Relaxed Versions 

The following problems are variations which work with more approximate solution spaces. This can be 
considered as allowing some noise in either the inputs or outputs. It is not difficult to extend the above 
proofs to see that these four problems are equivalent as well. 
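Relaxed Dictionary Representation (RDR)

$\mathcal{X}_{RDR} = (D, s, \delta)$, $m \times n$ matrix $D$, vector $s$ with $s \in \operatorname{col}(D)$, $\delta > 0$
$\mathcal{S}_{RDR}(D, s, \delta) = \{(v, w) \text{ each in } \mathbb{R}^n : D(v - w) = s,\ \|w\| < \delta\}$
$M_{RDR}(\langle D, s, \delta \rangle, \langle v, w \rangle) = \|v\|_0$

Relaxed MinUnsatisfy (RMU)

$\mathcal{X}_{RMU} = (A, y, \delta)$, $m \times n$ matrix $A$, vector $y \in \mathbb{R}^m$, $\delta > 0$
$\mathcal{S}_{RMU}(A, y, \delta) = \{(x \in \mathbb{R}^n, w \in \mathbb{R}^m) : \|w\| < \delta\}$
$M_{RMU}(\langle A, y, \delta \rangle, \langle x, w \rangle) = \|y - Ax + w\|_0$

Relaxed Sparse Null Space (RSNS)

$\mathcal{X}_{RSNS} = (A, \delta)$, matrix $A$, and $\delta > 0$
$\mathcal{S}_{RSNS}(A, \delta) = \{(M, N) : N \text{ is a full null matrix for } A,\ \|M\| < \delta\}$
$M_{RSNS}(\langle A, \delta \rangle, \langle M, N \rangle) = \operatorname{nnz}(M + N)$

Relaxed Matrix Sparsification (RMS)

$\mathcal{X}_{RMS} = (B, \delta)$, matrix $B$, and $\delta > 0$
$\mathcal{S}_{RMS}(B, \delta) = \{(M, N) : N = BX,\ X \text{ invertible},\ \|M\| < \delta\}$
$M_{RMS}(\langle B, \delta \rangle, \langle M, N \rangle) = \operatorname{nnz}(M + N)$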
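To make the role of the noise term concrete, here is a minimal Python sketch that evaluates the relaxed min unsatisfy objective for a given candidate pair (x, w). It rests on assumptions not fixed by the text: the norm on w is taken to be Euclidean, feasibility is read as the strict inequality ||w|| < δ, and the function name `rmu_objective` is hypothetical.

```python
import math

def rmu_objective(A_rows, y, delta, x, w):
    """Return ||y - Ax + w||_0, or None if w violates ||w|| < delta.

    A_rows is the matrix A as a list of rows. The Euclidean norm on w
    is an assumption of this sketch.
    """
    if math.sqrt(sum(wi * wi for wi in w)) >= delta:
        return None                                  # infeasible noise vector
    residual = [yi - sum(a * xj for a, xj in zip(row, x)) + wi
                for row, yi, wi in zip(A_rows, y, w)]
    return sum(1 for r in residual if r != 0)

# Example: the third equation is off by 0.5 unless noise absorbs it.
A = [[1, 0], [0, 1], [1, 1]]
y = [1, 1, 2.5]
x = [1, 1]
no_noise = rmu_objective(A, y, 1, x, [0, 0, 0])
with_noise = rmu_objective(A, y, 1, x, [0, 0, -0.5])
too_much = rmu_objective(A, y, 0.25, x, [0, 0, -0.5])
```

With no noise one equation stays unsatisfied; allowing a noise vector of norm 0.5 removes it, and shrinking δ below that norm makes the same pair infeasible.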